This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization. Given limited style exemplars ($\sim$100), the new architecture produces high-quality style-transfer results, with an advanced ability to synthesize high-fidelity content and strong generality for handling complicated scenes (e.g., occlusions and accessories). Moreover, it enables full-body image translation via one elegant evaluation network trained on partial observations (i.e., stylized heads). Few-shot learning-based style transfer is challenging because the learned model can easily overfit in the target domain, due to the biased distribution formed by only a few training examples. This paper aims to tackle this challenge by adopting the key idea of "calibrate first, translate later" and exploring an augmented global structure with locally focused translation. Specifically, the proposed DCT-Net consists of three modules: a content adapter that borrows the powerful prior from source photos to calibrate the content distribution of target samples; a geometry expansion module that uses affine transformations to release spatially semantic constraints; and a texture translation module that leverages samples produced by the calibrated distribution to learn a fine-grained conversion. Experimental results demonstrate the proposed method's superiority in head stylization and its effectiveness in full-image translation with adaptive deformations.
Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Though hopefully an alternative route toward general-purpose AI, existing generalist models are still at an early stage, where modality and task coverage are limited. To empower multi-modal task-scaling and speed up this line of research, we release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction. At the core of OFASys is the idea of decoupling multi-modal task representations from the underlying model implementations. In OFASys, a task involving multiple modalities can be defined declaratively, even with just a single line of code. The system automatically generates task plans from such instructions for training and inference, and it also facilitates multi-task training for diverse multi-modal workloads. As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data. The single OFA+ model achieves 95% of the performance, on average, of 15 task-finetuned models with only 16% of their parameters, showcasing the performance reliability of the multi-modal task-scaling provided by OFASys. Available at https://github.com/OFA-Sys/OFASys
Human civilization has an increasingly powerful influence on the Earth system. Affected by climate change and land-use change, natural disasters such as flooding have been increasing in recent years. Earth observations are an invaluable source for assessing and mitigating negative impacts, and detecting changes in Earth observation data is one way to monitor the possible impact. Effective and reliable Change Detection (CD) methods can help in identifying the risk of disaster events at an early stage. In this work, we propose a novel unsupervised CD method on time-series Synthetic Aperture Radar (SAR) data. Our proposed method is a probabilistic model trained with unsupervised learning techniques, reconstruction, and contrastive learning. The change map is generated with the help of the distribution difference between pre-incident and post-incident data. Our proposed CD model is evaluated on flood detection data. We verified the efficacy of our model on 8 different flood sites, including three recent flood events from Copernicus Emergency Management Services and six from the Sen1Floods11 dataset. Our proposed model achieved an average Intersection over Union (IoU) of 64.53% and an F1 score of 75.43%. Our IoU is approximately 6-27% and our F1 score approximately 7-22% better than those of the compared existing unsupervised and supervised CD methods. The results and extensive discussion presented in the study show the effectiveness of the proposed unsupervised CD method.
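The IoU and F1 scores reported above can be computed from a predicted binary change map and a ground-truth map. A minimal sketch, with change maps represented as flat 0/1 pixel lists; the function name `iou_and_f1` is ours, not from the paper:

```python
def iou_and_f1(pred, truth):
    """Return (IoU, F1) for two equal-length binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    union = tp + fp + fn
    iou = tp / union if union else 1.0          # two empty masks match perfectly
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return iou, f1
```

Any 2D change map can be flattened to this form before scoring; per-site scores are then averaged to obtain the figures quoted in the abstract.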
Active learning with strong and weak labelers considers a practical setting in which we have access both to costly but accurate strong labelers and to inaccurate but cheap predictions provided by weak labelers. We study this problem in the streaming setting, where decisions must be taken online. We design a novel algorithmic template, Weak Labeler Active Cover (WL-AC), that is able to robustly leverage the lower-quality weak labelers to reduce the query complexity while retaining the desired level of accuracy. Prior active learning algorithms with access to weak labelers learn a difference classifier that predicts where the weak labels differ from the strong labelers; this requires the strong assumption that the difference classifier is realizable (Zhang and Chaudhuri, 2015). WL-AC bypasses this realizability assumption and is thus applicable to many real-world scenarios, such as randomly corrupted weak labels and high-dimensional families of difference classifiers (e.g., deep neural nets). Moreover, WL-AC cleverly trades off evaluating the quality of the weak labelers against fully exploiting them, which makes it possible to convert any active learning strategy into one that can leverage weak labelers. We provide an instantiation of this template that achieves the optimal query complexity for any given weak labeler, without knowing its accuracy a priori. Empirically, we propose an instantiation of the WL-AC template that can be efficiently implemented for large-scale models (e.g., deep neural nets) and show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
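The setting can be illustrated with a toy streaming loop that consults the cheap weak labeler first and pays for a strong label only on disagreement. This is a sketch of the general setting, not the WL-AC algorithm itself; `model`, `weak`, and `strong` are hypothetical callables introduced for illustration:

```python
def stream_labels(xs, model, weak, strong):
    """Process a stream of examples; return (labels, strong_queries):
    the labels used for training and the number of costly strong-labeler
    queries spent."""
    labels, queries = [], 0
    for x in xs:
        w = weak(x)
        if model(x) == w:            # weak label looks consistent: trust it
            labels.append(w)
        else:                        # disagreement: pay for the strong label
            queries += 1
            labels.append(strong(x))
    return labels, queries
```

WL-AC is considerably more refined than this (it maintains a cover of plausible classifiers and estimates weak-labeler quality online), but the cost structure it optimizes is the `queries` counter above.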
Source-free unsupervised domain adaptation (SFUDA) aims to learn a target-domain model using unlabeled target data and a well-trained source-domain model. Most previous SFUDA works focus on inferring the semantics of target data based on the source knowledge. Without measuring the transferability of the source knowledge, these methods exploit it insufficiently and fail to identify the reliability of the inferred target semantics. However, existing transferability measurements require either source data or target labels, which are infeasible in SFUDA. To this end, we first propose a novel Uncertainty-induced Transferability Representation (UTR), which leverages uncertainty as a tool to analyze the channel-wise transferability of the source encoder in the absence of source data and target labels. The domain-level UTR unravels how transferable the encoder channels are to the target domain, and the instance-level UTR characterizes the reliability of the inferred target semantics. Second, based on the UTR, we propose a novel Calibrated Adaption Framework (CAF) for SFUDA, including i) a source knowledge calibration module that guides the target model to learn the transferable source knowledge and discard the non-transferable knowledge, and ii) a target semantics calibration module that calibrates the unreliable semantics. With the help of the calibrated source knowledge and target semantics, the model can adapt to the target domain safely. We verify the effectiveness of our method with experimental results and demonstrate that the proposed method achieves state-of-the-art performance on three SFUDA benchmarks. Code is available at https://github.com/spiresearch/utr.
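The instance-level idea of using uncertainty to flag unreliable target semantics can be illustrated with a standard predictive-entropy measure. This is our simplified illustration of uncertainty-based reliability filtering, not the paper's exact UTR formulation; the threshold and function names are assumptions:

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a softmax output; higher entropy means the
    inferred semantics for this sample are less reliable."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def reliable(probs, threshold):
    """Flag a target sample as reliable if its predictive entropy is
    below a chosen threshold."""
    return predictive_entropy(probs) < threshold
```

A confident prediction such as `[0.97, 0.01, 0.01, 0.01]` has near-zero entropy and would pass calibration, whereas a near-uniform output approaches the maximum entropy `log(K)` and would be treated as unreliable.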
Existing domain adaptation methods assume that domain discrepancies are caused by a few discrete attributes and variations. We therefore propose to investigate a new problem, Continuous Domain Adaptation (CDA), through the lens of infinite domains formed by continuously varying attributes. Leveraging knowledge from two labeled source domains and from the data of several observed unlabeled target domains, CDA aims to learn a generic model for the whole data distribution with continuous attributes. Besides the contribution of formulating this new problem, we also propose a novel approach as a strong CDA baseline. Specifically, first, we propose a novel alternating training strategy to reduce discrepancies among multiple domains while generalizing to unseen target domains. Second, we propose a continuity constraint when estimating the cross-domain divergence measurement. Finally, to decouple the divergence from the mini-batch size, we design a domain-specific queue to maintain a global view of the source domain, which further boosts the adaptation performance. Our method is shown to achieve the state of the art on the CDA problem through extensive experiments. The code is available at https://github.com/spiresearch/cda.
Recently, out-of-distribution (OOD) generalization has attracted attention to the robustness and generalization ability of deep-learning-based models, and many strategies have accordingly been developed to address different aspects of this problem. However, most existing OOD generalization algorithms are complicated and specifically designed for certain datasets. To alleviate this issue, NICOChallenge-2022 provides NICO++, a large-scale dataset with diverse context information. In this paper, based on a systematic analysis of different schemes on the NICO++ dataset, we propose a simple but effective learning framework via a coupled bag of tricks, including multi-objective framework design, data augmentation, and training and inference strategies. Our algorithm is memory-efficient and easy to deploy, has no complicated modules, and does not require large pre-trained models. It achieves a Top-1 accuracy of 88.16% on the public test set and 75.65% on the private test set, winning first place in the domain generalization task of NICOChallenge-2022.
Musculoskeletal and neurological disorders are the most common causes of walking problems among older people, and they often lead to diminished quality of life. Analyzing walking motion data manually requires trained professionals, and the evaluations may not always be objective. To facilitate early diagnosis, recent deep-learning-based methods have shown promising results for automated analysis, as they can discover patterns that traditional machine-learning methods have not found. We observe that existing works mostly apply deep learning to individual-joint features, such as the time series of joint positions. Owing to the challenge of discovering inter-joint features, such as the distance between the feet (i.e., the stride width), from the generally smaller-scale medical datasets, these methods usually perform sub-optimally. As a result, we propose a solution that explicitly takes both individual-joint features and inter-joint features as input, relieving the system of the need to discover more complicated features from small data. Owing to the distinctive natures of the two types of features, we introduce a two-stream framework, with one stream learning from the time series of joint positions and the other from the time series of relative joint displacements. We further develop a mid-layer fusion module to combine the patterns discovered in the two streams for diagnosis, resulting in a complementary representation of the data for better prediction performance. We validated our system on a benchmark dataset of 3D skeleton motions involving 45 patients with musculoskeletal and neurological disorders and achieved a prediction accuracy of 95.56%, outperforming state-of-the-art methods.
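The two streams above consume two views of the same skeleton sequence. A minimal sketch of how the inter-joint view can be derived from raw joint positions; the exact feature definitions here (positions relative to a reference joint, foot-to-foot distance as stride width) are our illustration of the idea rather than the paper's precise formulation:

```python
import math

def relative_displacements(frames, ref=0):
    """frames: list of frames, each a list of (x, y, z) joint positions.
    Return each joint's position relative to a reference joint, per frame."""
    out = []
    for joints in frames:
        rx, ry, rz = joints[ref]
        out.append([(x - rx, y - ry, z - rz) for (x, y, z) in joints])
    return out

def stride_width(frames, left_foot, right_foot):
    """Per-frame Euclidean distance between the two foot joints."""
    return [math.dist(joints[left_foot], joints[right_foot])
            for joints in frames]
```

The first stream would consume `frames` directly, while the second consumes features like the outputs above, so neither network has to learn inter-joint relations from scratch on a small dataset.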
Classifying sleep stages is essential for diagnosing various diseases. However, existing automated diagnosis methods mostly adopt the "gold-standard" electroencephalogram (EEG) or other uni-modal sensing signals from polysomnography (PSG) machines in hospitals, which are expensive and not portable, and are therefore unsuitable for point-of-care monitoring at home. To enable sleep stage monitoring at home, in this paper we analyze the relationship between infrared videos and EEG signals and propose a new task: classifying sleep stages from infrared videos by distilling useful knowledge from EEG signals into the visual modality. To establish a reliable cross-modal benchmark for this application, we develop a new dataset termed Seeing Your Sleep Stages via Infrared Video and EEG ($S^3VE$). $S^3VE$ is a large-scale dataset of synchronized infrared videos and EEG signals for sleep stage classification, including 105 subjects and 154,573 video clips totaling more than 1100 hours. Our contributions are not limited to the dataset; they also include a novel cross-modal distillation baseline model, namely structure-aware contrastive distillation (SACD), which distills EEG knowledge into infrared video features. SACD achieves state-of-the-art performance on both our $S^3VE$ benchmark and the existing cross-modal distillation benchmark. Both the benchmark and the baseline method will be released to the community. We hope to raise more attention to, and promote more development in, sleep stage classification and, more importantly, cross-modal distillation from clinical signals/media to conventional ones.
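The basic mechanism behind cross-modal distillation is to push the student's (video) predictive distribution toward the teacher's (EEG). A generic response-distillation term based on KL divergence, shown as a simplified illustration; the paper's SACD loss is structure-aware and contrastive, which this sketch does not capture:

```python
import math

def kl_divergence(teacher, student):
    """KL(teacher || student) between two discrete output distributions;
    minimizing this over the student pulls it toward the teacher."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)
```

During training, `teacher` would come from an EEG-based network and `student` from the infrared-video network on the same synchronized clip; the KL term is added to the student's classification loss.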
Introduction: Blood vessels can be visualized in digital fundus images (DFIs). Several studies have shown an association between cardiovascular risk and vascular features derived from DFIs. Recent advances in computer vision and image segmentation have enabled automated DFI blood vessel segmentation. A resource that automatically computes digital vasculature biomarkers (VBMs) from these segmented DFIs is needed. Methods: In this paper, we introduce a Python Vasculature BioMarker toolbox, denoted PVBM. A total of 11 VBMs are implemented. In particular, we introduce new algorithmic methods to estimate tortuosity and branching angles. Using PVBM, and as a proof of usability, we analyzed geometric vascular differences between glaucomatous patients and healthy controls. Results: We built a fully automated vasculature biomarker toolbox based on DFI segmentations and provided a proof of usability by characterizing the vascular changes in glaucoma. For both arterioles and venules, all biomarkers were significant and lower in glaucoma patients compared to healthy controls, except for tortuosity, venular singularity length, and venular branching angle. Conclusion: We have automated the computation of 11 VBMs from retinal blood vessel segmentation. The PVBM toolbox is open source under the GNU GPL 3 license and is available on physiozoo.com (following publication).
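A vessel tortuosity biomarker is often defined as the arc-to-chord ratio of the vessel centerline. The sketch below shows this classic definition for orientation; PVBM introduces its own new algorithmic methods for tortuosity and branching angles, which this does not reproduce:

```python
import math

def tortuosity(points):
    """Arc-to-chord ratio of a vessel centerline given as (x, y) points:
    total path length divided by the straight-line distance between the
    endpoints. A perfectly straight vessel scores 1.0."""
    arc = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    return arc / chord
```

In practice the centerline points would be extracted by skeletonizing the segmented vessel mask before the ratio is computed per vessel segment.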